Focused Crawling using Asynchronous Cellular Learning Automata

Author

  • M. R. Meybodi
Abstract

Web crawling is used to collect the web pages that a search engine indexes. The search engine then uses these crawled and indexed pages to answer users' queries. Because the volume of web pages is very large and grows continuously, search engines can index only a limited portion of it. For this reason, focused crawling algorithms have been introduced in recent years. These algorithms act selectively during crawling and collect only the web pages related to a specific topic. In this paper, an approach to focused crawling based on asynchronous cellular learning automata is proposed. The approach combines web structure mining and web usage mining techniques and consists of two phases. In the first phase, the relationship structure of the pages (that is, which pages are related and to what degree) is determined using asynchronous cellular learning automata, hyperlinks, and users' behavior in visiting web pages. In the second phase, focused crawling is performed using the obtained relationship structure, and the pages related to a specific topic are collected. Experimental results show that the proposed method outperforms the Best-First crawler in terms of harvest rate and target recall, and that its performance is independent of the choice of the initial set.
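The abstract gives no algorithmic detail, so the following is only a minimal Python sketch of the two-phase idea under loudly stated assumptions. It is not the authors' method. Each page is treated as a cell holding one learning automaton whose actions are the page's outgoing links, and the action probabilities are read as relevance degrees. The linear reward-inaction (L_RI) update, the step size `a=0.1`, the use of each observed user click as the reward signal, the toy graph and sessions, and every function name (`learn_relevance`, `focused_crawl`, `harvest_rate`, `target_recall`) are hypothetical. A true cellular learning automaton would also make each cell's reinforcement depend on its neighbouring cells through a local rule, which this sketch omits. Harvest rate is taken as the fraction of crawled pages that are relevant and target recall as the fraction of a known target set that the crawl reaches.

```python
import heapq
import random


class LinkAutomaton:
    """Hypothetical learning automaton for one page (one cell).

    Actions are the page's outgoing links; action probabilities are read as
    the relevance degree of each linked page. Uses linear reward-inaction
    (L_RI), an update scheme assumed for this sketch.
    """

    def __init__(self, links, a=0.1):
        self.a = a  # reward step size (hypothetical value)
        self.p = {link: 1.0 / len(links) for link in links}

    def reward(self, chosen):
        # L_RI reward: shift probability toward `chosen`; the total stays 1.
        # On a penalty, L_RI leaves probabilities unchanged, so none is modelled.
        for link in self.p:
            if link == chosen:
                self.p[link] += self.a * (1.0 - self.p[link])
            else:
                self.p[link] -= self.a * self.p[link]


def learn_relevance(graph, sessions, epochs=20, seed=0):
    """Phase 1 (simplified): learn link relevance from user sessions."""
    rng = random.Random(seed)
    cells = {page: LinkAutomaton(links) for page, links in graph.items() if links}
    # Each observed click (src -> dst) is one activation of the cell at src.
    clicks = [(s[i], s[i + 1]) for s in sessions for i in range(len(s) - 1)]
    for _ in range(epochs):
        rng.shuffle(clicks)  # asynchronous: one cell at a time, in random order
        for src, dst in clicks:
            cell = cells.get(src)
            if cell is not None and dst in cell.p:
                cell.reward(dst)
    return {page: dict(cell.p) for page, cell in cells.items()}


def focused_crawl(graph, relevance, seeds, budget):
    """Phase 2: best-first crawl ordered by the learned link relevance."""
    frontier = [(-1.0, s) for s in seeds]  # negate scores: heapq is a min-heap
    heapq.heapify(frontier)
    seen, visited = set(seeds), []
    while frontier and len(visited) < budget:
        _, page = heapq.heappop(frontier)
        visited.append(page)
        for link in graph.get(page, []):
            if link not in seen:
                seen.add(link)
                score = relevance.get(page, {}).get(link, 0.0)
                heapq.heappush(frontier, (-score, link))
    return visited


def harvest_rate(visited, relevant):
    # Fraction of crawled pages that are on-topic.
    return sum(p in relevant for p in visited) / len(visited) if visited else 0.0


def target_recall(visited, targets):
    # Fraction of a known target set that the crawl reached.
    return len(set(visited) & targets) / len(targets) if targets else 0.0


# Toy example (hypothetical graph and user sessions).
graph = {
    "home": ["sports", "ai"], "ai": ["ml", "news"], "ml": ["cla"],
    "sports": ["football"], "news": [], "football": [], "cla": [],
}
sessions = [["home", "ai", "ml", "cla"], ["home", "ai", "ml"]]
relevance = learn_relevance(graph, sessions)
visited = focused_crawl(graph, relevance, ["home"], budget=4)
print(visited)  # ['home', 'ai', 'ml', 'cla']
print(harvest_rate(visited, {"ai", "ml", "cla"}))  # 0.75
```

Because users in the toy sessions always follow `home -> ai -> ml`, the automata at those pages concentrate their probability on those links, and the crawler spends its budget on that path instead of on `sports` or `news`. Note that the Best-First crawler used as the baseline in the abstract usually ranks links by textual similarity to the topic, whereas this sketch ranks them only by the learned usage-based probabilities.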


Similar resources

Improved Frog Leaping Algorithm Using Cellular Learning Automata

In this paper, a new algorithm that combines cellular learning automata with the shuffled frog leaping algorithm (SFLA) is proposed for optimization in continuous, static environments. In the proposed algorithm, each memeplex of frogs is placed in a cell of the cellular learning automata. The learning automaton in each cell acts as the brain of the memeplex and determines the strategy of moti...

Asynchronous cellular learning automata

A cellular learning automaton is a combination of cellular automata and learning automata. The synchronous version, in which the learning automata in all cells are activated synchronously, has found many applications. In some applications, a type of cellular learning automaton in which the learning automata in different cells are activated asynchronously (asynchronous ...

A Cellular Learning Automata (CLA) Approach to Job Shop Scheduling Problem

The job shop scheduling problem (JSSP), one of the NP-hard combinatorial optimization problems, has attracted the attention of many researchers during the last four decades. The overall goal in this problem is to minimize the maximum completion time of the jobs, known as the makespan. This paper presents an approach to evolving Cellular Learning Automata (CLA) in order to enable it to solve the J...

Evolving Robust Asynchronous Cellular Automata for the Density Task

In this paper, the evolution of three kinds of asynchronous cellular automata is studied for the density task. The results are compared with those obtained for synchronous automata, and the influence of various asynchronous update policies on the computational strategy is described. The behavior of synchronous and asynchronous cellular automata is investigated as the update policy is gradually changed,...

Phase Space Invertible Asynchronous Cellular Automata

While for synchronous deterministic cellular automata there is an accepted definition of reversibility, the situation is less clear for asynchronous cellular automata. We first discuss a few possibilities and then investigate what we call phase space invertible asynchronous cellular automata in more detail. We will show that for each Turing machine there is such a cellular automaton simulating ...

Publication date: 2008